103 research outputs found

    Creation of public use files: lessons learned from the comparative effectiveness research public use files data pilot project

    In this paper we describe lessons learned from the creation of Basic Stand Alone (BSA) Public Use Files (PUFs) for the Comparative Effectiveness Research Public Use Files Data Pilot Project (CER-PUF). CER-PUF aims to increase access to the Centers for Medicare and Medicaid Services (CMS) Medicare claims datasets through PUFs that do not require user fees or data use agreements, have been de-identified to assure the confidentiality of beneficiaries and providers, and still provide substantial analytic utility to researchers. For this paper we define PUFs as datasets characterized by free and unrestricted access for any user. We derive lessons learned from five major project activities: (i) a review of the statistical and computer science literature on best practices in PUF creation, (ii) interviews with comparative effectiveness researchers to assess their data needs, (iii) case studies of PUF initiatives in the United States, (iv) interviews with stakeholders to identify the most salient issues regarding making microdata publicly available, and (v) the actual process of creating the Medicare claims data BSA PUFs.

    Avoiding disclosure of individually identifiable health information: a literature review

    Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.
    Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility

    Tax Social Accounting Matrix 1997 (Matriz de Contabilidad Social Tributaria 1997)

    This paper documents the features of a 1997 Social Accounting Matrix (SAM) for Colombia, with emphasis on tax accounts. We present four different formats according to different definitions of the Goods and Services Account: (i) product-product, (ii) activity-product, (iii) use and supply tables separated, and (iv) multiplier-analysis oriented. Using the latter, we compare the impact of three identical exogenous shocks on three key sectors of the Colombian economy in 1997: the coffee sector, the oil sector, and the gross fixed capital formation account. We find that the shock most beneficial to household income and government revenues is the exogenous shock to the coffee sector.

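    Inventory and management criteria for the mammals of the Parque Nacional de Ordesa y Monte Perdido (Inventario y criterios de gestión de los mamíferos del Parque Nacional de Ordesa y Monte Perdido)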

    2 volumes + 1 volume of annexes + summary. Final report of the research agreement between the Organismo Autónomo de Parques Nacionales and the Instituto Pirenaico de Ecología (CSIC). Peer reviewed.

    The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment

    The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in operation since July 2014. This paper describes the second data release from this phase, and the fourteenth from SDSS overall (making this Data Release Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first two years of operation (July 2014-2016). Like all previous SDSS releases, DR14 is cumulative, including the most recent reductions and calibrations of all data taken by SDSS since the first phase began operations in 2000. New in DR14 is the first public release of data from the extended Baryon Oscillation Spectroscopic Survey (eBOSS); the first data from the second phase of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2), including stellar parameter estimates from an innovative data-driven machine learning algorithm known as "The Cannon"; and almost twice as many data cubes from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous release (N = 2812 in total). This paper describes the location and format of the publicly available data from SDSS-IV surveys. We provide references to the important technical papers describing how these data have been taken (both targeting and observation details) and processed for scientific use. The SDSS website (www.sdss.org) has been updated for this release, and provides links to data downloads, as well as tutorials and examples of data use. SDSS-IV is planning to continue to collect astronomical data until 2020, and will be followed by SDSS-V.
    Comment: SDSS-IV collaboration alphabetical author data release paper. DR14 happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov 2017 (this is the "post-print" and "post-proofs" version; minor corrections only from v1, and most of the errors found in proofs corrected).

    Genotype and phenotype landscape of MEN2 in 554 medullary thyroid cancer patients: the BrasMEN study

    Multiple endocrine neoplasia type 2 (MEN2) is an autosomal dominant genetic disease caused by RET gene germline mutations that is characterized by medullary thyroid carcinoma (MTC) associated with other endocrine tumors. Several reports have demonstrated that the RET mutation profile may vary according to the geographical area. In this study, we collected clinical and molecular data from 554 patients with surgically confirmed MTC from 176 families with MEN2 in 18 different Brazilian centers to compare the type and prevalence of RET mutations with those from other countries. The most frequent mutations, classified by the number of families affected, occur in codon 634, exon 11 (76 families), followed by codon 918, exon 16 (34 families: 26 with M918T and 8 with M918V) and codon 804, exon 14 (22 families: 15 with V804M and 7 with V804L). When compared with other major published series from Europe, there are several similarities and some differences. While the mutations in codons C618, C620, C630, E768 and S891 present a similar prevalence, some mutations have a lower prevalence in Brazil, and others are found mainly in Brazil (G533C and M918V). These results reflect the singular proportion of European, Amerindian and African ancestries in the Brazilian mosaic genome.
    Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS). Grant numbers: 2006/60402-1; 2010/51547-1; 2013/01476-9; 2014/06570-6; 2009/50575-4; 2010/51546-5; 2012/21942-1; 16/2551-0000482-

    The clustering of galaxies in the completed SDSS-III Baryon Oscillation Spectroscopic Survey: cosmological analysis of the DR12 galaxy sample

    We present cosmological results from the final galaxy clustering data set of the Baryon Oscillation Spectroscopic Survey, part of the Sloan Digital Sky Survey III. Our combined galaxy sample comprises 1.2 million massive galaxies over an effective area of 9329 deg$^2$ and volume of 18.7 Gpc$^3$, divided into three partially overlapping redshift slices centred at effective redshifts 0.38, 0.51, and 0.61. We measure the angular diameter distance $D_M$ and Hubble parameter $H$ from the baryon acoustic oscillation (BAO) method after applying reconstruction to reduce non-linear effects on the BAO feature. Using the anisotropic clustering of the pre-reconstruction density field, we measure the product $D_M H$ from the Alcock-Paczynski (AP) effect and the growth of structure, quantified by $f\sigma_8(z)$, from redshift-space distortions (RSD). We combine measurements presented in seven companion papers into a set of consensus values and likelihoods, obtaining constraints that are tighter and more robust than those from any one method. Combined with Planck 2015 cosmic microwave background measurements, our distance scale measurements simultaneously imply curvature $\Omega_K = 0.0003 \pm 0.0026$ and a dark energy equation of state parameter $w = -1.01 \pm 0.06$, in strong affirmation of the spatially flat cold dark matter model with a cosmological constant ($\Lambda$CDM). Our RSD measurements of $f\sigma_8$, at 6 per cent precision, are similarly consistent with this model. When combined with supernova Ia data, we find $H_0 = 67.3 \pm 1.0$ km/s/Mpc even for our most general dark energy model, in tension with some direct measurements. Adding extra relativistic species as a degree of freedom loosens the constraint only slightly, to $H_0 = 67.8 \pm 1.2$ km/s/Mpc. Assuming flat $\Lambda$CDM we find $\Omega_m = 0.310 \pm 0.005$ and $H_0 = 67.6 \pm 0.5$ km/s/Mpc, and we find a 95% upper limit of $0.16$ eV/$c^2$ on the neutrino mass sum.

    Use of multidimensional item response theory methods for dementia prevalence prediction : an example using the Health and Retirement Survey and the Aging, Demographics, and Memory Study

    Background: Data sparsity is a major limitation to estimating national and global dementia burden. Surveys with full diagnostic evaluations of dementia prevalence are prohibitively resource-intensive in many settings. However, validation samples from nationally representative surveys allow for the development of algorithms for the prediction of dementia prevalence nationally. Methods: Using cognitive testing data and data on functional limitations from Wave A (2001-2003) of the ADAMS study (n = 744) and the 2000 wave of the HRS study (n = 6358), we estimated a two-dimensional item response theory model to calculate cognition and function scores for all individuals over 70. Based on diagnostic information from the formal clinical adjudication in ADAMS, we fit a logistic regression model for the classification of dementia status using cognition and function scores and applied this algorithm to the full HRS sample to calculate dementia prevalence by age and sex. Results: Our algorithm had a cross-validated predictive accuracy of 88% (86-90) and an area under the curve of 0.97 (0.97-0.98) in ADAMS. Prevalence was higher in females than in males and increased with age, with a prevalence of 4% (3-4) in individuals 70-79, 11% (9-12) in individuals 80-89 years old, and 28% (22-35) in those 90 and older. Conclusions: Our model had similar or better accuracy compared with previously reviewed algorithms for the prediction of dementia prevalence in HRS, while utilizing more flexible methods. These methods could be more easily generalized and utilized to estimate dementia prevalence in other national surveys.